Sliding Window-based Speech-to-Lips Conversion with Low Delay
Abstract
The goal of a good speech-to-lips conversion system is to synthesize high-quality, realistic lip movement that is time-synchronized with the input speech. Previously, maximum likelihood estimation (MLE) of the visual trajectory with a Gaussian Mixture Model (GMM) was proposed and tested successfully for speech-to-lips conversion. It works as a sentence-level batch process that converts acoustic speech signals into a visual lip movement trajectory. In this paper, we propose a moving-window-based, low-delay speech-to-lips conversion method for real-time communication applications. The new approach is an approximation of the MLE-GMM conversion, but it can render lip movement on the fly with low latency. Experimental results on the LIPS2009 dataset show that the proposed real-time method achieves a latency of less than 100 ms while maintaining quality comparable to that of the batch method.
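The moving-window idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `lookback` and `lookahead` parameters, and the toy converter are all hypothetical, and the real method would run an MLE-GMM trajectory estimate inside each window. The sketch only shows the control flow: for each frame, a batch-style converter sees a short window of context, and only the output for the current frame is kept, so the algorithmic delay is set by the number of look-ahead frames.

```python
from typing import Callable, List, Sequence

Frame = float  # stand-in for one acoustic or visual feature vector (assumption)


def sliding_window_convert(
    frames: Sequence[Frame],
    convert_window: Callable[[Sequence[Frame]], List[Frame]],
    lookback: int,
    lookahead: int,
) -> List[Frame]:
    """Approximate a batch converter by applying it to a window around each frame.

    For frame t, the window covers frames[t - lookback : t + lookahead + 1],
    clipped at the sequence boundaries. Only the converted value that
    corresponds to frame t is emitted.
    """
    outputs: List[Frame] = []
    n = len(frames)
    for t in range(n):
        start = max(0, t - lookback)
        end = min(n, t + lookahead + 1)
        converted = convert_window(frames[start:end])
        outputs.append(converted[t - start])
    return outputs


def algorithmic_latency_ms(lookahead: int, frame_shift_ms: float) -> float:
    """Delay caused by waiting for future frames (computation time excluded)."""
    return lookahead * frame_shift_ms


def toy_batch_convert(window: Sequence[Frame]) -> List[Frame]:
    """Hypothetical stand-in for a batch converter: maps every frame to the window mean."""
    mean = sum(window) / len(window)
    return [mean] * len(window)


result = sliding_window_convert([1.0, 2.0, 3.0, 4.0], toy_batch_convert, 1, 1)
```

In this sketch, a larger `lookahead` gives the converter more future context, which brings the result closer to the sentence-level batch output, at the cost of a proportionally longer delay.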
Similar resources
Speech difficulties in Joubert syndrome
Introduction: "Joubert syndrome" was first introduced in 1969. This syndrome is a rare genetic disease with autosomal dominant pattern. Hypotonia, ataxia and motor delay are known clinical manifestations of the disease. In the few reports of this syndrome, mostly functional and structural components studied and radiographic images such as speech and language developmental delay symptoms has been l...
FDiBC: A Novel Fraud Detection Method in Bank Club based on Sliding Time and Scores Window
One of the recent strategies for increasing customer loyalty in the banking industry is the use of a customers' club system. In this system, customers receive scores on the basis of the financial and club activities they perform, and, based on the points achieved, they get credits from the bank. In addition, with the advent of new technologies, fraud is growing in the banking domain as well. Therefor...
Audio-to-Visual Speech Conversion Using Deep Neural Networks
We study the problem of mapping from acoustic to visual speech with the goal of generating accurate, perceptually natural speech animation automatically from an audio speech signal. We present a sliding window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset. Overlapping visual predictions are av...
Multipath Communication with Finite Sliding Window Network Coding for Ultra-Reliability and Low Latency
We use a random linear network coding (RLNC) based scheme for multipath communication in the presence of lossy links with different delay characteristics to obtain ultra-reliability and low latency. A sliding window version of RLNC is proposed where the coded packets are generated using the packets within a window and are inserted among systematic packets in different paths. The packets are schedule...
A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion
High quality speech-to-lips conversion, investigated in this work, renders realistic lips movement (video) consistent with input speech (audio) without knowing its linguistic content. Instead of memoryless frame-based conversion, we adopt maximum likelihood estimation of the visual parameter trajectories using an audio-visual joint Gaussian Mixture Model (GMM). We propose a minimum converted tra...